05:58
2026-05-30
lesswrong.com
ai-safety
Belief manifolds, and how to steer along them
A BlueDot Technical AI Safety Project researcher reproduced a study from Goodfire demonstrating that language model representations form curved geometric manifolds, not simple linear directions. The w…